feat: add ring-buffer scaffold for memory profiling #428

Open
rcoh wants to merge 2 commits into main from feat/memory-profiling-ring-source

Contributor

@rcoh rcoh commented May 19, 2026

Adds pub(crate) mod memory_profiling (gated on the new memory-profiling cargo feature) with:

  • RawAlloc<MAX_FRAMES> and RawFree fixed-size POD records.
  • RingBuffers<MAX_FRAMES> wrapping two crossbeam_queue::ArrayQueues: an alloc queue (default 4096 slots × 1056 B ≈ 4 MiB at 128 frames) and a free queue (default 32K slots × 32 B = 1 MiB). The free queue has 8× the alloc slot count per design §9, because frees are pushed unconditionally while allocs are pushed only when sampled.
  • MemoryProfileSource<MAX_FRAMES> implementing the existing Source trait, draining both queues each flush cycle and emitting AllocEvent / FreeEvent into the trace via the consolidator's ThreadLocalEncoder.
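
For reviewers checking the sizing numbers, here is a minimal sketch of record layouts that would produce them. The field names, their order, and the `queue_bytes` helper are assumptions for illustration only; the PR states just the record sizes (1056 B and 32 B) and the slot counts. The arithmetic also ignores any per-slot overhead the queue itself adds.

```rust
use std::mem::size_of;

/// Hypothetical layout of a sampled allocation record.
/// 32 bytes of header plus MAX_FRAMES * 8 bytes of frame addresses.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawAlloc<const MAX_FRAMES: usize> {
    pub timestamp_ns: u64,
    pub addr: u64,
    pub size: u64,
    pub thread_id: u32,
    pub frame_count: u8,
    pub _pad: [u8; 3],
    pub frames: [u64; MAX_FRAMES],
}

/// Hypothetical layout of a free record: no stack, fixed 32 bytes.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawFree {
    pub timestamp_ns: u64,
    pub addr: u64,
    pub size: u64,
    pub thread_id: u32,
    pub _pad: u32,
}

/// Default slot counts from the PR description.
pub const ALLOC_SLOTS: usize = 4096;
pub const FREE_SLOTS: usize = 8 * ALLOC_SLOTS; // 32K slots

/// Payload bytes held by a queue of `slots` records (queue bookkeeping excluded).
pub const fn queue_bytes(slots: usize, record_bytes: usize) -> usize {
    slots * record_bytes
}

// Compile-time checks that the assumed layouts match the sizes quoted above.
const _: () = assert!(size_of::<RawAlloc<128>>() == 1056);
const _: () = assert!(size_of::<RawFree>() == 32);
```

Under these assumptions the alloc queue holds 4096 × 1056 = 4,325,376 bytes of payload (about 4.1 MiB) and the free queue holds 32,768 × 32 = 1,048,576 bytes (exactly 1 MiB).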

Drain is timestamp-ordered: at each step we emit whichever queue head has the older timestamp. A naive 'all allocs, then all frees' drain would corrupt the liveset under address reuse within a single flush cycle (alloc → free → alloc-with-same-addr): the free would be applied after the second alloc and remove the entry for the allocation that is still live. See the address_reuse_within_flush_cycle_preserves_liveset test.
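
To make the ordering argument concrete, here is a rough sketch of a two-queue merge by timestamp, plus a replay into an address-keyed map. It is not the code in this PR: `VecDeque` stands in for the ring buffers, the tuple records, the `Event` enum, `drain_ordered`, and `replay` are all hypothetical, and breaking timestamp ties in favor of the alloc is an assumption.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the emitted AllocEvent / FreeEvent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Alloc { ts: u64, addr: u64, size: u64 },
    Free { ts: u64, addr: u64 },
}

/// Merge two queues that are each already in timestamp order, always
/// emitting the head with the older timestamp (allocs win ties).
fn drain_ordered(
    allocs: &mut VecDeque<(u64, u64, u64)>, // (ts, addr, size)
    frees: &mut VecDeque<(u64, u64)>,       // (ts, addr)
) -> Vec<Event> {
    let mut out = Vec::with_capacity(allocs.len() + frees.len());
    loop {
        let take_alloc = match (allocs.front(), frees.front()) {
            (Some(a), Some(f)) => a.0 <= f.0,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => break,
        };
        if take_alloc {
            let (ts, addr, size) = allocs.pop_front().unwrap();
            out.push(Event::Alloc { ts, addr, size });
        } else {
            let (ts, addr) = frees.pop_front().unwrap();
            out.push(Event::Free { ts, addr });
        }
    }
    out
}

/// Apply events in the given order to a liveset keyed by address (addr -> size).
fn replay(events: &[Event]) -> HashMap<u64, u64> {
    let mut live = HashMap::new();
    for event in events {
        match *event {
            Event::Alloc { addr, size, .. } => {
                live.insert(addr, size);
            }
            Event::Free { addr, .. } => {
                live.remove(&addr);
            }
        }
    }
    live
}
```

With alloc(ts=1), free(ts=2), alloc(ts=3) all at the same address, replaying the merged order leaves the second allocation in the liveset, while replaying both allocs before the free leaves it empty.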

Liveset tracking (HashMap<u64, LivesetEntry>) is wired in but only activates when MemoryProfileSource is constructed with MemoryProfileSource::new(rings, true).
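
One way the opt-in behaviour could look, as a sketch only: the map exists only when tracking was requested at construction, and every hook is a no-op otherwise. `LivesetTracker`, its methods, and the fields of `LivesetEntry` are hypothetical; the PR only says the liveset is a HashMap<u64, LivesetEntry> enabled by the second argument to `new`.

```rust
use std::collections::HashMap;

/// Hypothetical contents of a liveset entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LivesetEntry {
    size: u64,
    alloc_ts: u64,
}

/// Liveset keyed by allocation address; `None` means tracking is disabled.
struct LivesetTracker {
    live: Option<HashMap<u64, LivesetEntry>>,
}

impl LivesetTracker {
    /// Mirrors the boolean passed to the source constructor.
    fn new(track_liveset: bool) -> Self {
        Self { live: track_liveset.then(HashMap::new) }
    }

    /// Record an allocation; does nothing when tracking is off.
    fn on_alloc(&mut self, addr: u64, size: u64, alloc_ts: u64) {
        if let Some(live) = self.live.as_mut() {
            live.insert(addr, LivesetEntry { size, alloc_ts });
        }
    }

    /// Remove and return the entry for a freed address, if it was tracked.
    fn on_free(&mut self, addr: u64) -> Option<LivesetEntry> {
        self.live.as_mut()?.remove(&addr)
    }

    /// Total bytes currently live (0 when tracking is off).
    fn live_bytes(&self) -> u64 {
        self.live
            .as_ref()
            .map_or(0, |live| live.values().map(|e| e.size).sum())
    }
}
```

Keeping the map behind an `Option` means a source built with tracking disabled pays no hashing or allocation cost on the drain path.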

No allocator hook yet (later commit). Tests push synthetic RawAllocs and RawFrees directly into the queues and verify AllocEvents and FreeEvents come out the other end.

@rcoh rcoh requested a review from yulnr May 19, 2026 11:58
- Add compile-time assertion that MAX_FRAMES <= u8::MAX (frame_count field)
- Replace frames().to_vec() with direct slice of destructured array to
  avoid a heap allocation per drain in handle_alloc
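
The two changes in this commit could be sketched roughly as follows. This is an illustration under assumptions, not the actual diff: the reduced `RawAlloc` fields, the `new` constructor, the `FRAME_COUNT_FITS_U8` constant, and the callback-style `handle_alloc` signature are all hypothetical.

```rust
/// Reduced, hypothetical version of the record: just the stack portion.
struct RawAlloc<const MAX_FRAMES: usize> {
    frame_count: u8,
    frames: [u64; MAX_FRAMES],
}

impl<const MAX_FRAMES: usize> RawAlloc<MAX_FRAMES> {
    /// Evaluated per instantiation; fails the build if the u8 frame_count
    /// field could not represent MAX_FRAMES.
    const FRAME_COUNT_FITS_U8: () = assert!(
        MAX_FRAMES <= u8::MAX as usize,
        "MAX_FRAMES must fit in the u8 frame_count field"
    );

    /// Copy up to MAX_FRAMES frames from a captured stack.
    fn new(stack: &[u64]) -> Self {
        // Referencing the constant forces the assertion to be checked.
        let () = Self::FRAME_COUNT_FITS_U8;
        let n = stack.len().min(MAX_FRAMES);
        let mut frames = [0u64; MAX_FRAMES];
        frames[..n].copy_from_slice(&stack[..n]);
        Self { frame_count: n as u8, frames }
    }
}

/// Hand the valid frames to `sink` as a borrowed slice of the destructured
/// array, so no Vec is allocated per drained record.
fn handle_alloc<const MAX_FRAMES: usize>(
    raw: RawAlloc<MAX_FRAMES>,
    mut sink: impl FnMut(&[u64]),
) {
    let RawAlloc { frame_count, frames } = raw;
    sink(&frames[..frame_count as usize]);
}
```

In this sketch, instantiating `RawAlloc::<256>::new` would fail to compile, since 256 exceeds `u8::MAX`.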